Group 26 - Assignment 5

Carl Felix Freiesleben - s203521
Emilie Munk - s203538
Josefine Løken - s183784
Judith Tierno Martí - s222869
Sahand Yazdani - s203538

Intro

The analysis was performed on the Right Heart Catheterization (RHC) dataset, first analysed by Connors et al. (1996).

  • The original study focuses on the effect of RHC on critically ill patients.

  • The authors used propensity score matching to construct a matched control group.

  • Their study found that patients undergoing RHC experienced shorter survival times.

  • The attributes include patient demographics, socioeconomic details, physiological parameters, disease-related information, and survival outcomes.

Materials and Methods

We performed our analysis in R using the tidyverse.

Before cleaning and augmentation:

  • 5735 patients

  • 62 attributes

After cleaning and augmentation:

  • 5612 patients

  • 53 attributes

Materials and Methods (exploratory dataviz)

We familiarised ourselves with the data by summarising the attributes and making numerous plots.

We created summaries of the attributes to identify what was worth analysing, for example the number of patients by RHC status (swang1) and outcome (death):

# A tibble: 4 × 3
# Groups:   swang1 [2]
  swang1 death     n
   <dbl> <dbl> <int>
1      0     0  1291
2      0     1  2177
3      1     0   681
4      1     1  1463
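As a minimal sketch, assuming the counts above come from grouping by swang1 and counting death, the crude mortality within each RHC group can be computed with dplyr. Because rhc_aug itself is not reproduced here, the code rebuilds a stand-in data set from the four counts with tidyr::uncount(); the names rhc_counts, patients and mortality are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for rhc_aug: rebuild one row per patient
# from the four swang1/death counts shown in the tibble above
rhc_counts <- tibble(
  swang1 = c(0, 0, 1, 1),
  death  = c(0, 1, 0, 1),
  n      = c(1291L, 2177L, 681L, 1463L)
)
patients <- uncount(rhc_counts, n)

# count() keeps the swang1 grouping, so sum(n) is the total per RHC group
# and prop is the share of alive (death = 0) or dead (death = 1) patients
mortality <- patients |>
  group_by(swang1) |>
  count(death) |>
  mutate(prop = n / sum(n)) |>
  ungroup()

mortality
```

With these counts, mortality is about 62.8% (2177/3468) among patients without RHC and 68.2% (1463/2144) among those with RHC. These are crude, unadjusted proportions, not a propensity-matched comparison.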

We used histograms because they are easy to read and interpret while still conveying a lot of information.

[Figure: two small example histograms]

Table 1

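library(dplyr)
library(table1)

# Recode sex, RHC status and outcome as factors so table1() reports them as categories
rhc_aug |> mutate(sex = factor(sex),
                    swang1 = factor(swang1),
                    death = factor(x = death, levels = c(0, 1), labels = c("Alive", "Dead"))) |> 
  # Summarise sex, age, race and RHC status (swang1), stratified by outcome
  table1(x = ~ sex + age + race + swang1 | death,
         data = _)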
                     Alive             Dead              Overall
                     (N=1972)          (N=3640)          (N=5612)
sex
  Female             906 (45.9%)       1594 (43.8%)      2500 (44.5%)
  Male               1066 (54.1%)      2046 (56.2%)      3112 (55.5%)
age
  Mean (SD)          56.6 (17.4)       64.0 (15.7)       61.4 (16.7)
  Median [Min, Max]  58.0 [18.0, 102]  66.0 [18.0, 101]  64.0 [18.0, 102]
race
  black              323 (16.4%)       577 (15.9%)       900 (16.0%)
  other              121 (6.1%)        223 (6.1%)        344 (6.1%)
  white              1528 (77.5%)      2840 (78.0%)      4368 (77.8%)
swang1
  0                  1291 (65.5%)      2177 (59.8%)      3468 (61.8%)
  1                  681 (34.5%)       1463 (40.2%)      2144 (38.2%)

Investigating mean blood pressure across diseases

  • The distributions are bimodal.
  • Mean blood pressure appears to be higher in patients who did not receive RHC.

How Your Medical Insurance Influences Your Survival Chances

  • Most patients have a low income.
  • Most patients are covered by Medicare or private insurance.
  • Individuals in lower income categories have higher mortality.
  • Individuals covered by Medicare have the highest mortality.

PCA

Modelling

More beautiful plots by Emilie

Discussion

Why did we not make any major discoveries?

What could have been done differently?

Conclusion

We conclude that the principal components could be useful for further analysis.

We conclude that, for several diagnoses, high APS values are associated with an increased risk of death.